Add snapshot golden tests#1446
Conversation
db6f7fb to
b5a0b52
Compare
e8bcea9 to
0115ba2
Compare
22a0cff to
0658445
Compare
There was a problem hiding this comment.
Pull request overview
Adds snapshot “golden” testing and ABI tripwires to detect breaking changes in the on-disk guest↔host snapshot format, with CI automation to pull/regenerate published baselines from GHCR and validate cross-CPU portability.
Changes:
- Introduces a custom-harness
snapshot_goldenstest binary (libtest-mimic) plus fixtures/checks for snapshot load/round-trip validation against staged OCI image layouts. - Adds compile-time snapshot ABI tripwires pinning key format-defining constants and struct/layout offsets.
- Extends CI/Justfile tooling to pull/publish/regenerate goldens via GHCR (oras), including a
regen-goldenslabel-controlled path and cross-CPU verification.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/tests/rust_guests/simpleguest/src/main.rs | Adds guest/host echo and round-trip functions plus heap-pattern helpers used by golden fixtures/checks. |
| src/hyperlight_host/tests/snapshot_goldens/platform.rs | Detects platform (hypervisor/cpu/profile) and builds golden tag names for staging/verification. |
| src/hyperlight_host/tests/snapshot_goldens/oci.rs | Locates the staged OCI image-layout directories under target/snapshot-goldens/. |
| src/hyperlight_host/tests/snapshot_goldens/main.rs | Implements the custom harness entrypoint for verify and generate subcommands. |
| src/hyperlight_host/tests/snapshot_goldens/goldens_version.rs | Defines GOLDENS_VERSION and compat version set for multi-major verification. |
| src/hyperlight_host/tests/snapshot_goldens/fixtures.rs | Defines canonical sandbox config + deterministic mutation sequence for golden generation. |
| src/hyperlight_host/tests/snapshot_goldens/checks.rs | Adds independent checks that load goldens and validate captured state + ABI wire formats. |
| src/hyperlight_host/tests/integration_test.rs | Adjusts heap sizing in a couple tests to account for added guest-function registration overhead. |
| src/hyperlight_host/src/sandbox/snapshot/tripwires.rs | Adds const-eval assertions pinning ABI-critical constants, offsets, and discriminants. |
| src/hyperlight_host/src/sandbox/snapshot/mod.rs | Wires the new tripwires module into the snapshot module. |
| src/hyperlight_host/src/sandbox/snapshot/file/mod.rs | Exposes snapshot format constants to support tripwires. |
| src/hyperlight_host/src/sandbox/snapshot/file/media_types.rs | Tightens docs/visibility for snapshot media-type and ABI version constants. |
| src/hyperlight_host/src/sandbox/snapshot/file/config.rs | Adds targeted load-validation tests and JSON schema “pinned” round-trip tests. |
| src/hyperlight_host/src/sandbox/snapshot/file_tests.rs | Adds tests for from_snapshot config behavior and init-data permission round-tripping. |
| src/hyperlight_host/Cargo.toml | Adds libtest-mimic and registers the snapshot_goldens custom harness test target. |
| Justfile | Adds snapshot-goldens pull/generate/verify recipes and ensures clippy covers the custom harness target. |
| docs/snapshot-versioning.md | Documents snapshot format versioning, enforcement mechanisms, and regen/publish workflows. |
| docs/github-labels.md | Documents the regen-goldens workflow-affecting label. |
| Cargo.lock | Locks new dependencies introduced for the golden test harness. |
| .github/workflows/ValidatePullRequest.yml | Adds label-driven mode selection and introduces cross-CPU verification for regen path; disables fail-fast. |
| .github/workflows/RegenSnapshotGoldens.yml | Adds a workflow to generate/publish versioned golden OCI layouts (with completion marker) to GHCR. |
| .github/workflows/dep_snapshot_cross_verify.yml | Adds reusable workflow to verify regenerated goldens across CPU vendors. |
| .github/workflows/dep_build_test.yml | Adds pull-vs-regen golden verification steps (oras + just recipes) and uploads regen artifacts. |
danbugs
left a comment
There was a problem hiding this comment.
Mostly LGTM, just 2 minor comments--great work!! Let's just get 1 more person to look at this as it is a pretty big PR.
| {{ cargo-cmd }} test -p hyperlight-component-util --profile={{ if target == "debug" { "dev" } else { target } }} {{ target-triple-flag }} --test wasmtime_guest_codegen | ||
|
|
||
| @# run the rest of the integration tests | ||
| {{ cargo-cmd }} test -p hyperlight-host {{ if features =="" {''} else if features=="no-default-features" {"--no-default-features" } else {"--no-default-features -F " + features } }} --profile={{ if target == "debug" { "dev" } else { target } }} {{ target-triple-flag }} --test '*' |
There was a problem hiding this comment.
This means that every test must be added here too or it won't run in CI, no?
There was a problem hiding this comment.
yeah correct. Not ideal but I'm not sure how to fix it. My main concern is that I don't want cargo test to require users to pull OCI images in order to pass.
There was a problem hiding this comment.
fixed! Found a better way to do it. The custom harness noop-passes without a flag
syntactically
left a comment
There was a problem hiding this comment.
I had time only for a pretty quick skim over this.
add compile-time tripwires that pin every value defining the format (SNAPSHOT_ABI_VERSION, the media-type strings, the OCI layout version, HyperlightPEB field offsets and size, and the OutBAction ports), helping develpers catch
Why are the PEB fields important? I think the are used only during creation of the original snapshot from a binary? If they are not important I would like to be careful not to constrain on them for no reason.
make sure snapshots taken on AMD are usable on Intel, and vice versa
Are we certain already that this is a goal? I thought we purposefully punted on this and compatibility across hypervisors until we had a better sense of whether it would be onerous.
| `VmAction` port numbers (the I/O ports the guest writes to for `Log`, | ||
| `CallFunction`, `Abort`, `DebugPrint`, and `Halt`), the layout of the | ||
| sandbox memory regions | ||
| (stack, heap, guest binary, input and output buffers, page tables), |
There was a problem hiding this comment.
Some of these regions the host does not care about the layout of anymore (e.g. stack).
Also, this list should include the scratch bookkeeping area.
|
|
||
| install-flatbuffers-with-vcpkg: install-vcpkg | ||
| cd ../vcpkg && ./vcpkg install flatbuffers || cd - | ||
|
|
There was a problem hiding this comment.
Can we have test-like-ci run these like CI does please?
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
2219634 to
9ed1e1b
Compare
This PR adds tests to help notice breaking changes in the guest<->host ABI. This matters because any change there breaks every snapshot already persisted to disk (bad!).
When a change to the format is intentional, the author bumps the version and applies the
regen-goldenslabel. A workflow then regenerates the goldens against the branch and pushes the new set to GHCR.See docs/snapshot-versioning.md for details.